Towards Online Concept Drift Detection with Feature Selection for Data Stream Classification
Abstract
Data streams are unbounded sequences of data instances that are generated very rapidly. Storing, querying and mining such rapid flows of data is computationally very challenging. Data Stream Mining (DSM) is concerned with mining such data streams in real time, using techniques that require only one pass through the data. DSM techniques need to be adaptive so that they reflect changes in the pattern encoded in the stream (concept drift). Because concept drift can change which features are relevant to a DSM classification task, this paper describes a first step towards a concept drift detection method with online feature tracking capabilities.
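The abstract does not spell out the detection mechanism. The following is a minimal sketch of single-pass drift detection with per-feature tracking. It assumes the classic one-sided Page-Hinkley test, which is not necessarily the method proposed in this paper. One detector runs on each feature, so an alarm also reports which features changed. The class names `PageHinkley` and `FeatureDriftTracker` and the `delta` and `threshold` values are hypothetical choices for illustration. This one-sided test only flags increases in a feature's mean; a decrease would need a mirrored test.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class PageHinkley:
    """One-sided Page-Hinkley test for an increase in a stream's mean.

    Hypothetical illustration, not the paper's method. It is single pass:
    only a running mean, a cumulative sum and that sum's minimum are stored.
    """
    delta: float = 0.005      # tolerated magnitude of change (assumed value)
    threshold: float = 50.0   # alarm threshold, often written lambda (assumed value)
    n: int = field(default=0, init=False)
    mean: float = field(default=0.0, init=False)
    cum: float = field(default=0.0, init=False)
    cum_min: float = field(default=0.0, init=False)

    def update(self, x: float) -> bool:
        """Consume one value; return True if a drift alarm is raised."""
        self.n += 1
        self.mean += (x - self.mean) / self.n       # incremental mean, no storage
        self.cum += x - self.mean - self.delta      # cumulative deviation
        self.cum_min = min(self.cum_min, self.cum)
        if self.cum - self.cum_min > self.threshold:
            self.reset()                            # start afresh after an alarm
            return True
        return False

    def reset(self) -> None:
        self.n, self.mean, self.cum, self.cum_min = 0, 0.0, 0.0, 0.0


class FeatureDriftTracker:
    """Hypothetical per-feature tracker: one Page-Hinkley detector per feature."""

    def __init__(self, n_features: int, delta: float = 0.005, threshold: float = 50.0):
        self.detectors = [PageHinkley(delta, threshold) for _ in range(n_features)]

    def update(self, instance: Sequence[float]) -> List[int]:
        """Feed one instance; return the indices of features whose detector fired."""
        return [i for i, (det, x) in enumerate(zip(self.detectors, instance))
                if det.update(x)]


def run_demo() -> List[Tuple[int, List[int]]]:
    """Synthetic stream: feature 0 jumps from 0.0 to 5.0 at t = 500, feature 1 stays at 1.0."""
    tracker = FeatureDriftTracker(n_features=2)
    events = []
    for t in range(1000):
        x = [0.0 if t < 500 else 5.0, 1.0]
        drifted = tracker.update(x)
        if drifted:
            events.append((t, drifted))
    return events


if __name__ == "__main__":
    for t, features in run_demo():
        print(f"t={t}: drift in features {features}")  # prints "t=510: drift in features [0]"
```

Running one cheap detector per feature keeps the whole tracker single pass and constant memory per feature. The list of drifted feature indices is the kind of signal a feature selection step could react to.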
Similar resources
Detecting Concept Drift in Data Stream Using Semi-Supervised Classification
A data stream is a sequence of data generated from various information sources at high speed and in high volume. Classifying data streams faces three challenges: unlimited length, online processing, and concept drift. In related research, the challenge of unlimited stream length is commonly met by dividing the stream into fixed-size windows or by using gradual forgetting. Concept drift refer...
Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features
Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...
Handling Gradual Concept Drift in Stream Data
Data streams are sequences of data examples (e.g., sensor readings, call records, web page visits) that arrive continuously in time-varying and possibly unbounded streams. These data streams are potentially huge, which makes it impossible to apply many data mining techniques to them. Techniques for classification fail to process data streams successfully because of two factors: their overwhelmin...
Feature Based Data Stream Classification (FBDC) and Novel Class Detection
Data stream classification poses many challenges to the data mining community. This paper addresses challenges such as infinite length, concept drift, concept evolution, and feature evolution. Since a data stream is theoretically infinite in length, it is impractical to store and use all the historical data for training. Concept-drift is a common phenomenon in data streams, which occu...
Classification and Novel Class Detection of Data Streams in a Dynamic Feature Space
Data stream classification poses many challenges, most of which are not addressed by the state of the art. We present DXMiner, which addresses four major challenges to data stream classification, namely, infinite length, concept-drift, concept-evolution, and feature-evolution. Data streams are assumed to be infinite in length, which necessitates single-pass incremental learning techniques. Conce...
Publication date: 2016